the most costly task, in terms of both computer time and storage, required by this process, so
that  improving  the  efficiency  of  such  computations  is  of  critical  importance  for  the  construction 
of  accurate  numerical  models.  Emphasis  has  been  on  sparse  iterative  methods  for 
solving  these  systems,  and  implementation  of  linear  system  solvers  on  parallel 
computers.  Major  results  are  summarized  in  the  final  report. 
A. Problem Statement

The  objective  of  this  project  was  to  study  numerical  methods  for  solving  sparse  linear  systems 
of  equations  of  the  type  that  arise  from  discretized  partial  differential  equations.  Such  systems 
arise  in  mathematical  models  of  numerous  physical  processes  including  turbulent  flow,  chemical 
reactive flow, semiconductor device simulation, and structural mechanics. For many such
problems, analytic solutions are not available, so the only way of obtaining insight into the model
is  through  numerical  approximation  and  solution.  The  solution  of  sparse  linear  systems  is  often 
the  most  costly  task,  in  terms  of  both  computer  time  and  storage,  required  by  this  process,  so 
that  improving  the  efficiency  of  such  computations  is  of  critical  importance  for  the  construction 
of  accurate  numerical  models.  Our  emphasis  is  on  sparse  iterative  methods  for  solving  these 
systems,  and  implementation  of  linear  system  solvers  on  parallel  computers. 


B.  Summary  of  Major  Results 

Results  have  been  obtained  in  four  general  areas: 

1. Iterative methods for non-self-adjoint elliptic problems. In a series of studies, we have
developed, tested and analyzed iterative algorithms for solving the two-dimensional
convection-diffusion equation discretized by finite differences. One class of methods that has been
demonstrated to be successful for this problem is line relaxation methods, combined with a reduced
system  methodology.  Here,  the  discrete  problem  is  first  treated  by  one  step  of  block  Gaussian 
elimination,  which  reduces  the  size  of  the  problem  by  roughly  a  factor  of  two.  The  resulting 
reduced  system  is  then  ordered  using  a  line  ordering  and  solved  by  block  relaxation  methods. 
Analytic bounds on the rate of convergence of these methods show them to be rapidly
convergent, with rates that actually improve as the amount of convection in the model increases
(i.e., as the problem becomes more non-self-adjoint). Extensive numerical experiments on model
problems  confirm  the  fast  convergence  predicted  by  the  analysis,  and  they  also  indicate  that  the 
analytic  results  (which  apply  only  to  constant  coefficient  problems)  agree  with  the  behavior  of 
the  methods  on  variable  coefficient  problems,  e.g.  with  turning  points.  Moreover,  experimental 
and analytic results show the methods to be more effective than analogous techniques applied
to the “unreduced” system. Empirical results also identify effective ordering strategies, and they
show  that  acceleration  with  Krylov  subspace  methods  such  as  the  generalized  minimum  residual 
method  can  be  used  to  improve  the  convergence  properties.  This  work  was  done  in  collaboration 
with  G.  H.  Golub  of  Stanford  University  and  G.  Starke  of  the  University  of  Karlsruhe. 
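
The reduction step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the studies: it assumes centered differences on a uniform grid of the unit square, a red-black ordering of the grid points, and dense NumPy storage. The function names and the parameters `sigma` and `tau` (convection coefficients) are hypothetical. The reduced system is solved directly here, where the report applies block relaxation to a line ordering of it.

```python
import numpy as np

def convection_diffusion_2d(n, sigma, tau):
    """Centered five-point discretization of -Laplace(u) + sigma*u_x + tau*u_y
    on an n-by-n interior grid of the unit square, scaled by h^2
    (homogeneous Dirichlet boundary values assumed)."""
    h = 1.0 / (n + 1)
    gx, gy = sigma * h / 2.0, tau * h / 2.0   # cell Reynolds numbers

    def tridiag(g):
        # sub-diagonal -(1 + g), diagonal 2, super-diagonal -(1 - g)
        return (2.0 * np.eye(n) - (1.0 + g) * np.eye(n, k=-1)
                - (1.0 - g) * np.eye(n, k=1))

    # Unknown k = j*n + i, with the x-index i varying fastest.
    return np.kron(np.eye(n), tridiag(gx)) + np.kron(tridiag(gy), np.eye(n))

def solve_via_reduced_system(A, f, n):
    """Eliminate the red points of a red-black ordering, solve the reduced
    system on the black points, then recover the red values."""
    k = np.arange(n * n)
    is_red = (k % n + k // n) % 2 == 0
    red, black = k[is_red], k[~is_red]

    d_red = A[red, red]                  # A_rr is diagonal: red points only
    A_rb = A[np.ix_(red, black)]         # couple to black neighbours
    A_br = A[np.ix_(black, red)]
    A_bb = A[np.ix_(black, black)]

    # One step of block Gaussian elimination (a Schur complement).
    S = A_bb - A_br @ (A_rb / d_red[:, None])
    g = f[black] - A_br @ (f[red] / d_red)

    u = np.empty(n * n)
    u[black] = np.linalg.solve(S, g)     # stand-in for block relaxation
    u[red] = (f[red] - A_rb @ u[black]) / d_red
    return u, S

n = 8
A = convection_diffusion_2d(n, sigma=10.0, tau=5.0)
f = np.ones(n * n)
u, S = solve_via_reduced_system(A, f, n)
print("full size:", A.shape[0], "reduced size:", S.shape[0])
print("residual norm:", np.linalg.norm(f - A @ u))
```

For an even `n` the reduced matrix has exactly half as many rows as the original, which is the factor-of-two reduction mentioned in the text.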

In  a  related  study,  techniques  based  on  preconditioning  by  incomplete  LU  factorization  were 
considered  for  the  discrete  convection-diffusion  equation.  Analytic  results  were  used  to  derive 
incomplete  factorization  preconditioners  that  avoid  the  numerical  instabilities  displayed  when 
more  standard  incomplete  factorization  (ILU  and  MILU)  methods  are  applied  to  this  problem. 
The  new  stabilized  methods  are  adaptive  in  the  sense  that  their  definitions  are  more  closely 
tied to values of the underlying differential operator than traditional algebraic methods in
this  class.  The  methods  were  shown  to  be  very  robust  and  much  more  effective  than  standard 
methods  for  solving  a  wide  collection  of  benchmark  problems,  including  examples  with  variable 
coefficients  or  locally  refined  meshes. 
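
For reference, the standard ILU preconditioner that the stabilized methods are compared against can be sketched as below. This is the textbook ILU(0) factorization (no fill-in outside the nonzero pattern of the matrix), not the stabilized variant developed in the study, whose definition is not given here. The test matrix, the function names, and the simple preconditioned Richardson iteration are assumptions made for illustration; in practice the preconditioner would be paired with a Krylov subspace method and sparse storage.

```python
import numpy as np

def convection_diffusion_2d(n, sigma, tau):
    """Centered five-point discretization of -Laplace(u) + sigma*u_x + tau*u_y
    on an n-by-n interior grid of the unit square, scaled by h^2."""
    h = 1.0 / (n + 1)
    gx, gy = sigma * h / 2.0, tau * h / 2.0

    def tridiag(g):
        return (2.0 * np.eye(n) - (1.0 + g) * np.eye(n, k=-1)
                - (1.0 - g) * np.eye(n, k=1))

    return np.kron(np.eye(n), tridiag(gx)) + np.kron(tridiag(gy), np.eye(n))

def ilu0(A):
    """ILU(0): Gaussian elimination that discards all fill-in outside the
    nonzero pattern of A. Returns unit-lower L and upper U packed together."""
    m = A.shape[0]
    LU = A.copy()
    pattern = A != 0.0
    for i in range(1, m):
        for k in range(i):
            if not pattern[i, k]:
                continue
            LU[i, k] /= LU[k, k]
            # Update only the entries of row i that lie inside the pattern.
            cols = np.nonzero(pattern[i, k + 1:])[0] + k + 1
            LU[i, cols] -= LU[i, k] * LU[k, cols]
    return LU

def apply_ilu(LU, r):
    """Solve (L U) z = r by forward and backward substitution."""
    m = len(r)
    z = r.copy()
    for i in range(m):
        z[i] -= LU[i, :i] @ z[:i]
    for i in range(m - 1, -1, -1):
        z[i] = (z[i] - LU[i, i + 1:] @ z[i + 1:]) / LU[i, i]
    return z

def preconditioned_richardson(A, f, LU, tol=1e-8, max_iter=1000):
    """Iterate u <- u + (LU)^{-1} (f - A u) until the relative residual
    drops below tol; returns the iterate and the iteration count."""
    u = np.zeros_like(f)
    for it in range(max_iter):
        r = f - A @ u
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            return u, it
        u += apply_ilu(LU, r)
    return u, max_iter

A = convection_diffusion_2d(8, sigma=10.0, tau=5.0)
f = np.ones(A.shape[0])
u, its = preconditioned_richardson(A, f, ilu0(A))
print("iterations:", its, "residual norm:", np.linalg.norm(f - A @ u))
```

The parameters chosen here keep the cell Reynolds numbers below one, so the matrix is an M-matrix and ILU(0) is well behaved. The instabilities mentioned in the text concern harder regimes that this sketch does not attempt to reproduce.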

2. Finite element solvers on parallel computers. We have performed a study of parallel
implementation of a finite element solution technique for second order elliptic differential equations
on a shared memory parallel computer, the Alliant FX/8. The focus was on the hp-version of
the  finite  element  method,  which  achieves  accuracy  by  using  both  high  order  basis  functions  and 
mesh  refinement.  The  algorithm  used  includes  local  elimination  of  high  order  basis  functions 
with  support  internal  to  elements,  combined  with  preconditioned  conjugate  gradient  solution  of 
unknowns on element interfaces. Results indicate that the fully parallel computations (local
assembly and elimination) dominate the computational cost, and that the global (i.e., less highly
parallel) operations, such as preconditioners, represent a small percentage of overall costs. New
results  include  a  clear  picture  of  the  effects  of  architectural  features  such  as  cache  memory  on 
performance;  a  demonstration  of  the  use  of  efficiently  coded  kernels  from  the  BLAS3  library  to 
improve performance; a predictive analytic model of the effects of synchronization of local
operations; and a demonstration that high order elements are an efficient means of achieving high
accuracy. This work was done in collaboration with I. Babuska, K. Markley, and D. K. Lee of the
University  of  Maryland. 
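
The two-stage structure described above (local elimination of interior basis functions, then conjugate gradient solution for the interface unknowns) can be illustrated on a deliberately small one-dimensional model. The sketch below is an assumption-laden toy, not the code of the study: it solves -u'' + u = 1 on (0, 1) with u(0) = u(1) = 0, uses one quadratic "bubble" function per element as the only interior unknown, and applies unpreconditioned conjugate gradients where the report uses a preconditioned iteration. All function names are hypothetical.

```python
import numpy as np

def condensed_interface_system(n_elem):
    """Assemble the vertex (interface) system after eliminating the interior
    bubble unknown of every element."""
    h = 1.0 / n_elem
    # Element matrices for the local basis {1 - t, t, t(1 - t)}, t in [0, 1].
    K = np.array([[1.0, -1.0, 0.0],
                  [-1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0 / 3.0]]) / h            # stiffness
    M = h * np.array([[1 / 3, 1 / 6, 1 / 12],
                      [1 / 6, 1 / 3, 1 / 12],
                      [1 / 12, 1 / 12, 1 / 30]])          # mass
    Ae = K + M
    fe = h * np.array([0.5, 0.5, 1.0 / 6.0])              # load for f = 1

    # Local elimination: independent for each element, hence fully parallel.
    Se = Ae[:2, :2] - np.outer(Ae[:2, 2], Ae[2, :2]) / Ae[2, 2]
    ge = fe[:2] - Ae[:2, 2] * fe[2] / Ae[2, 2]

    S = np.zeros((n_elem + 1, n_elem + 1))
    g = np.zeros(n_elem + 1)
    for e in range(n_elem):
        S[e:e + 2, e:e + 2] += Se
        g[e:e + 2] += ge
    # Dirichlet conditions: drop the two boundary vertices.
    return S[1:-1, 1:-1], g[1:-1]

def conjugate_gradient(S, g, tol=1e-12, max_iter=200):
    """Plain conjugate gradients for a symmetric positive-definite system."""
    u = np.zeros_like(g)
    r = g.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * np.linalg.norm(g):
            break
        Sp = S @ p
        alpha = rr / (p @ Sp)
        u += alpha * p
        r -= alpha * Sp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return u

n_elem = 8
S, g = condensed_interface_system(n_elem)
u = conjugate_gradient(S, g)
x = np.linspace(0.0, 1.0, n_elem + 1)[1:-1]
exact = 1.0 - np.cosh(x - 0.5) / np.cosh(0.5)
print("max vertex error:", np.max(np.abs(u - exact)))
```

In this toy every element has the same local matrices, so the elimination is computed once. In the hp-version setting of the report each element carries many interior functions, and the per-element eliminations are the part identified as dominating the cost.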

3.  Multilevel  preconditioners  and  their  implementation  on  large-scale  parallel  computers.  We 
have performed several studies of multilevel preconditioners for both two-dimensional and
three-dimensional discrete self-adjoint elliptic differential equations. An advantage of such methods is
their  fast  convergence,  which  depends  only  logarithmically  on  the  number  of  mesh  points.  New 
results  include  efficient  parallel  algorithms  for  these  techniques,  grid  generation  strategies  that 
relate multilevel methods more closely to underlying differential operators, and extensive
numerical experiments on a large scale parallel computer, the Connection Machine 2. The enhanced
algorithms  greatly  improve  the  performance  of  these  methods  for  solving  anisotropic  problems 
and  problems  with  discontinuous  coefficients,  and  experiments  on  the  Connection  Machine  show 
that  they  are  effective  on  parallel  machines  provided  communication  costs  are  not  too  high.  In 
addition,  a  new  algebraic  hierarchical  basis  multigrid  method  has  been  developed  that  has  an 
optimal  rate  of  convergence  (i.e.,  independent  of  the  number  of  mesh  points),  and  displays  faster 
experimental  convergence  than  related  methods  such  as  the  algebraic  multilevel  method.  This 
project  was  performed  in  collaboration  with  X.-Z.  Guo  of  the  University  of  Maryland. 

4. Parallel sparse direct solvers. In a study of parallel direct solution of sparse positive-definite
linear systems, we compared the performance of two versions of parallel sparse Cholesky
factorizations, based on two strategies of task assignment, dynamic allocation and static allocation.
The  first  allocation  strategy  is  commonly  used  on  shared-memory  machines,  where  a  “pool  of 
tasks”  is  accessible  to  all  processors;  the  second  type  of  strategy  is  needed  for  distributed  memory 
machines.  An  advantage  of  static  allocation  strategies  is  that  they  can  be  used  on  either  class 
of  machine.  Our  tests,  on  a  shared-memory  20-processor  Encore  Multimax  computer,  indicate 
that  the  best  dynamic  allocation  strategy  displays  parallel  efficiencies  of  roughly  85%,  compared 
with  efficiencies  of  about  50%  for  the  best  static  strategies  (so  that  the  latter  are  roughly  60% 
as effective). Enhancements of static algorithms designed to make use of idle processors
dramatically improve performance by reducing spin-waiting times. This work was done in collaboration
with G. Zhang of the University of Maryland.
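
The difference between the two task-assignment strategies can be illustrated with a toy scheduling model. The sketch below is not a sparse Cholesky factorization and does not reproduce the measurements quoted above: it assumes independent tasks with known run times, ignores task dependencies and synchronization overhead, and uses a round-robin rule as the static assignment. The function names and the task list are hypothetical. Parallel efficiency is taken to be serial time divided by (number of processors times parallel time).

```python
import heapq

def static_makespan(task_times, n_proc):
    """Static allocation: task k is assigned in advance to processor
    k mod n_proc, regardless of how long the tasks turn out to take."""
    loads = [0.0] * n_proc
    for k, t in enumerate(task_times):
        loads[k % n_proc] += t
    return max(loads)

def dynamic_makespan(task_times, n_proc):
    """Dynamic allocation: whichever processor becomes idle first takes the
    next task from a shared pool of tasks."""
    free_at = [0.0] * n_proc          # times at which processors become idle
    for t in task_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)

def efficiency(task_times, n_proc, makespan):
    """Parallel efficiency = serial time / (processors * parallel time)."""
    return sum(task_times) / (n_proc * makespan)

tasks = [8, 1, 1, 1, 8, 1, 1, 1, 8, 1, 1, 1]   # uneven, hypothetical costs
p = 4
for name, span in [("static", static_makespan(tasks, p)),
                   ("dynamic", dynamic_makespan(tasks, p))]:
    print(name, "makespan:", span, "efficiency:", efficiency(tasks, p, span))
```

With this task list the round-robin rule places all three long tasks on one processor while the others sit idle, which is the kind of idle time that the enhanced static algorithms are designed to recover.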